Abstract

This paper reports an empirical study on the application of the immersion approach to English teaching at North 
China Institute of Science and Technology (NCIST), in line with the basic principles of the Canadian immersion 
teaching mode. The experimental results and the survey show that the students involved in the immersion course improved 
faster in productive skills (i.e. writing and speaking) than those in conventional education programs. The study 
reveals that, apart from gaining social and cultural knowledge, the immersion students seem to have developed a 
more positive attitude towards English study. It also suggests that the immersion approach to English teaching in 
colleges could be a feasible alternative to the traditional mode. 

1. Introduction to the Immersion Teaching Model 

The term "immersion education" originated and came to prominence in Canada during the 1960s. Its basic theories 
and principles evolved from the prevailing theories of language acquisition and language learning. The immersion 
teaching mode is defined as the delivery of the second language curriculum in an immersed second language (SL) 
learning environment to learners who share the same first language. In this mode, the learners are "immersed" in the 
SL environment for all or part of the time, and only the second language is allowed in the learning process. The SL 
thus serves not only as the content of teaching but also as a tool for language teaching and acquisition. 

According to statistics from the early nineties, as many as 30 million students had participated in immersion programs 
in various forms in Canada. Later, China, the United States and many other countries followed the same model and 
began their own practice. However, this teaching mode has an obvious limitation: it is mainly designed for primary and 
secondary school students. Whether the principles of immersion teaching can be applied to adults, or to foreign language 
teaching in institutions of higher learning, has drawn the attention of many language instructors and learners. 

The University of Utah conducted a similar experiment in immersion education in 1985-1986 (Sternfield, 1989). 
The subjects of the experiment were first-year students who were native English speakers majoring in Spanish. These 
students took a Latin American Studies course for one year. The course was taught entirely in Spanish, and the teaching 
mainly covered Latin American history, geography, current events and so on. Language skills were no longer the 
content of teaching, and all teaching materials were drawn from up-to-date sources. The experimental results are 
consistent with those of the Canadian immersion teaching model: the Spanish level of these students was at least as 
high as, and sometimes higher than, that of students learning Spanish under the traditional teaching model. 

After more than 40 years of practice around the world, the immersion teaching model has provided valuable 
experience to second language teaching and has already produced fruitful results. 

2. The Immersion Experiment in College English Teaching 

2.1 Introduction to the Experiment 

Inspired by the immersion teaching model of Canada and the United States, we set out to conduct a 
study on the application of such a model to college students. In China, college English teaching has aroused the 
interest and concern of people from all walks of life and has achieved marked progress in terms of curriculum, 
teaching model, materials, teacher quality and so on. However, many problems remain, and they act as a 
bottleneck to further teaching reform and to improvement of the traditional model. At present, how to create a sound 
environment for English communication and how to meet society's need for competent communicators in 
English have become major tasks for many English teachers. 

With this task in mind and led by a number of senior teachers with 20 years of teaching experience, some NCIST 
English teachers formed a work team and conducted an experiment: applying the immersion teaching model to 
students in this institution of higher learning. 

2.2 The Experiment 

2.2.1 Objective 

Most Chinese students gradually become accustomed to the Grammar-Translation approach to teaching, and Chinese 
often becomes the working language in English lessons. With a view to reversing this trend, we 
conducted this experiment to verify the feasibility of the immersion teaching model in college English teaching. In 
brief, our objective was to explore whether the immersion teaching mode can replace or supplement the traditional 
skill-centered English teaching model. 

2.2.2 Subjects 

We chose 70 students at random from the 125 English-major freshmen enrolled in September 2006, all of whom had 
learned English in secondary school for 6 consecutive years. The average age of the 70 students was 18, and their 
overall learning and cognitive abilities were almost the same. We divided these 70 students into two classes of 
35 each, regardless of sex. One was the experimental group and the other the control group. Each class was then divided into 5 
teams, each with a team leader. 
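
The sampling and grouping described above can be sketched in a few lines. The paper does not say how the random draw or the split into classes and teams was carried out, so the shuffle-based procedure, the function name assign_groups, the placeholder student IDs and the fixed seed below are all illustrative assumptions rather than the procedure actually used.

```python
import random


def assign_groups(population, sample_size=70, n_teams=5, seed=2006):
    """Draw a random sample, split it into two equal classes, then into teams.

    Hypothetical helper: the study reports only the resulting sizes
    (70 of 125 students, two classes of 35, five teams per class).
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    sample = rng.sample(population, sample_size)
    half = sample_size // 2
    classes = {"experimental": sample[:half], "control": sample[half:]}

    def split_into_teams(members):
        # Deal members round-robin so team sizes differ by at most one.
        return [members[i::n_teams] for i in range(n_teams)]

    return {name: split_into_teams(members) for name, members in classes.items()}


if __name__ == "__main__":
    freshmen = [f"S{i:03d}" for i in range(1, 126)]  # 125 placeholder student IDs
    for name, teams in assign_groups(freshmen).items():
        print(name, [len(team) for team in teams])
```

With 35 students per class and 5 teams, each team in this sketch has 7 members.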

2.2.3 Hypothesis 

We had two hypotheses for the experiment: (a) leaving aside the academic knowledge derived from the 
courses, the development of linguistic ability in the experimental group is similar to that in the control group; (b) 
regarding the practical ability to use the language, the experimental group performs better than the control group (Liu 
Xiangfu, Cai Yun, 1997). 

2.2.4 Duration 

The experiment lasted one academic year comprising two semesters (the first from September 2006 to 
January 2007; the second from March 2007 to July 2007). 

2.2.5 Instrument 

The instruments used in the experiment were two tests and one questionnaire. Test One (pre-test) was held at the 
beginning of the experiment and comprised a written test (listening comprehension, English and American culture, 
translation of short passages) and an oral test. Test Two (post-test) was held at the end of the academic year, in the 
same form as the pre-test but at a higher level of difficulty. The questionnaire was administered during class time following 
the post-test. 

2.2.6 Procedures 

2.2.6.1 Pre-test 

The experiment began with the pre-test, which served as a placement test to gauge the students’ English level upon 
entering the College. It was held two weeks after the students’ admission. The purpose of the pre-test was to verify that the 
English proficiency levels of the two groups were very close. The test consisted of a written part (Listening 
Comprehension, Cultural Knowledge of Britain and America, Translation of Short Passages) and an oral part. The 
average score on each item for the experimental class and the control class, together with the corresponding t and U values, is 
shown in Table 1 (the full score for each item is 100). 

Insert Table 1 Here 

As shown in Table 1, the calculated t values are 12.96, 13.12, 12.71 and 12.99. At the P = 0.05 level, these t values 
are not significant, suggesting that the two groups of students were very close to each other in terms of English 
proficiency before participating in the immersion teaching experiment. We also compared the average score on 
each item using the U value. At the P = 0.05 level the critical U value is 45, and all the U 
values listed in Table 1 are higher than this critical value. This also shows that the two groups were very close to 
each other in English proficiency, which is consistent with the t values. 
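
The decision rule used here (a U value above the critical value means no significant difference) can be illustrated with a short sketch. The paper does not name the test behind its U values; the code below assumes a rank-based Mann-Whitney U statistic, which matches that decision rule. The function names and the small score lists are hypothetical, and the individual student scores from the study are not available.

```python
def rank_with_ties(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two Mann-Whitney U statistics."""
    pooled = list(group_a) + list(group_b)
    ranks = rank_with_ties(pooled)
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)


def groups_differ(u_value, critical_u):
    """Decision rule from the text: only a U at or below the critical value is significant."""
    return u_value <= critical_u


if __name__ == "__main__":
    # Made-up scores, used only to show how a U value is obtained.
    experimental = [74, 76, 71, 80, 73]
    control = [75, 72, 70, 79, 74]
    print("U =", mann_whitney_u(experimental, control))
    # Pre-test U values reported in Table 1, against the stated critical value of 45.
    for u in (65.46, 66.71, 68.32, 64.33):
        print(u, "significant difference:", groups_differ(u, 45))
```

Applied to the four reported pre-test U values, the rule flags none of them as significant, in line with the interpretation given above.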

2.2.6.2 Mobilization Activities 

Before carrying out the immersion teaching model, we ran an information campaign to make the 70 students 
well aware of the significance and objective of the experiment. We placed particular emphasis on the 
challenges and options facing the experimental group and stressed that their performance would influence our course 
design for English majors as well as their own learning efficiency. They were encouraged to recognize the importance of their own 
roles both in and out of class, without which all efforts would come to nothing. Meanwhile, we specified general requirements and 
learning tasks for each term of the experimental period. 

2.2.6.3 Immersion Teaching Model vs. Traditional Teaching Model 

During the experiment, the two classes were offered the same courses (Basic English 1&2, Spoken English 1&2, 
English Listening 1&2, English Reading 1&2, The Society and Culture of Major English-speaking Countries: An 
Introduction, Books 1&2) using the same teaching materials but different teaching models. 

For the experimental group, the immersion teaching model was fully applied. All the subjects were taught by foreign 
teachers from English-speaking countries such as America, Britain, Canada and Australia. Classroom 
activities included teachers’ lecturing, students’ answering questions, pair or group discussion, reading, etc. 
Following the principles of the immersion teaching model, all classroom activities were conducted in 
English. Furthermore, the foreign teachers were not required to explain vocabulary, grammar or sentence patterns 
explicitly. In this way, the students were immersed in a full-time English-learning environment. It should be pointed 
out that a young assistant was assigned to this group to help students who had difficulty communicating 
with their foreign teachers in the first semester. To ensure a good after-class environment for English 
communication, the young assistant acted as a supervisor for the experimental group, and the team leaders shared 
responsibility for the same task. Mutual supervision among the teams took place at the 
same time. Language laboratories and broadcasting stations on campus also played a major role in creating an 
environment for English communication. The extracurricular activities included simulated shopping, English drama, 
English songs, English speech contests, English variety shows, reading British and American classics, etc. In 
this way, an almost full-time English-speaking environment was established for the experimental class. 

For the control group, all the courses were taught by experienced and responsible Chinese teachers of English using the 
traditional teaching approach. That is to say, all courses were taught mainly with the grammar-translation method 
combined with the audio-lingual approach, which is currently practiced by most English teachers at NCIST and 
in the majority of colleges and universities in China. This model requires detailed, explicit explanation of grammar and 
vocabulary, together with practice in listening, speaking, reading and writing skills. Meanwhile, the 
control group was also encouraged to participate in active English learning by completing assignments, going 
to English Corners, taking part in English knowledge and skill competitions and so on. 

2.2.7 Post-test 

The experimental class followed the immersion teaching model, while the control group followed the traditional 
teaching mode. After one year of teaching and learning, another test (the post-test) was conducted. The 
average scores and U values for the experimental class and the control class are as follows: 

Insert Table 2 Here 

2.2.8 Interpretation of the Result 

From Table 2 we can see that the U values for Listening Comprehension and Culture of Britain and America are 65.21 and 
64.85 respectively; both are higher than the critical value. The U values for Translation of Passages and the Oral Test are 
44.38 and 40.33, both lower than the critical value. There is thus a significant difference between the two groups in Translation of 
Passages and the Oral Test. That is to say, after one year of teaching and learning, students in the 
experimental group had improved faster in their practical ability to use the language. 
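
The comparison can be made concrete with the figures reported in Table 2. The sketch below is only an illustration: the dictionary layout and the function name summarize are assumptions, and the critical value of 50 is taken from the note under Table 2 (using the value of 45 quoted for the pre-test would lead to the same conclusions).

```python
# Post-test mean scores and U values as reported in Table 2.
POST_TEST = {
    "Listening comprehension": {"experimental": 76.50, "control": 74.45, "u": 65.21},
    "Culture of Britain and America": {"experimental": 76.08, "control": 75.96, "u": 64.85},
    "Translation of passages": {"experimental": 77.58, "control": 74.78, "u": 44.38},
    "Oral test": {"experimental": 82.86, "control": 76.52, "u": 40.33},
}
CRITICAL_U = 50  # from the note under Table 2


def summarize(results, critical_u):
    """Return the between-group mean difference and the U-based decision per item."""
    summary = {}
    for item, row in results.items():
        summary[item] = {
            "mean_difference": round(row["experimental"] - row["control"], 2),
            # As in the text: a U value below the critical value marks a significant difference.
            "significant": row["u"] < critical_u,
        }
    return summary


if __name__ == "__main__":
    for item, info in summarize(POST_TEST, CRITICAL_U).items():
        print(f"{item}: difference {info['mean_difference']:+.2f}, "
              f"significant: {info['significant']}")
```

The largest gap between the two groups is on the oral test (82.86 against 76.52), followed by translation of passages, which are the two items whose U values fall below the critical value.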

The results of the pre-test and post-test support our experimental hypotheses. Through the 
immersion teaching model, the students’ language ability, especially their practical ability to use the language, can 
be greatly enhanced. The experiment proved successful, which suggests that the immersion teaching model is also 
feasible in college English teaching. 

2.2.9 Questionnaire Findings 

To further highlight the contrasting effects of the immersion and 
traditional models on students’ second language learning, we carried out a questionnaire survey among the two groups of 
students after the two tests. The questionnaire was designed for students’ self-evaluation of their own English learning; in other words, its 
objective was to gain a clear picture of the effect and efficiency of the immersion teaching model. In answering 
the questionnaire, the students were asked to reflect on their learning experience over the past academic year. 

On the one hand, the questionnaire survey was conducted to investigate the different effects on the two groups of 
students in terms of study motivation, study method and oral communication skills. On the other hand, it 
helps explain certain facts. In addition to confirming the positive effect of the immersion approach 
shown by the two tests, the findings of the survey revealed the following. First, the 
immersion teaching model has a sound effect on students’ after-class learning: they took the initiative to read 
related materials after class and purposefully tried to expand their reading so as to adapt well to the immersion 
teaching model. Second, the experimental students were less involved in examination-oriented learning, and as a 
result their motivation for using and communicating in English became stronger than that of the control group students. 
Third, the immersion teaching mode helped learners become more confident. The experimental group showed 
higher self-evaluation than the control group, especially in oral communication skills, and 
generally proved to have more confidence than the control group in the practical use of English, in thinking in 
English and in giving satisfactory answers in English. Additionally, the students of the experimental 
group were more optimistic about their prospects than those of the control group. 

3. Conclusion 

This experiment suggests that the immersion teaching model is not only feasible in college English teaching but also 
very effective in improving learning efficiency. The positive effect of the immersion teaching model is reflected by 
the fact that the productive skills of the students can be well developed and quickly improved, while the traditional 
model does not have the same effect. Meanwhile, through the immersion model, the students’ enthusiasm and 
confidence in learning English can be markedly enhanced. 

In short, this research has demonstrated the feasibility of the immersion teaching mode in institutions of higher learning. 
However, great importance should be attached to constructing a better English teaching and learning environment, 
which is a prerequisite and guarantee for the application of the immersion teaching mode. Meanwhile, the quality of 
English teachers should be constantly improved, since the immersion teaching model cannot succeed without 
highly qualified teachers. It is our conviction that we can make wise use of the immersion teaching mode so 
long as we cultivate the students’ interest in learning a foreign language and make them fully aware 
of the importance of communicative competence, and so long as our English teachers and administrative personnel 
are willing to update their educational concepts and make every effort to try new ways of improving the quality of 
college English teaching. 

Table 1. The average score of each item and U value in Pre-test 

                        Written test                                                  Oral test
                        Listening        Culture of Great Britain   Translation of
                        comprehension    and America                short passages
Experimental class      74.40            75.52                      72.62             75.59
Control class           73.89            75.13                      71.31             75.01
t value (p = 0.05)      12.96            13.12                      12.71             12.99
U value                 65.46            66.71                      68.32             64.33

(p = 0.05 level, the critical value U is 45) 


Table 2. The average score of each item and U value in Post-test 

                        Written test                                                  Oral test
                        Listening        Culture of Britain         Translation of
                        comprehension    and America                passages
Experimental group      76.50            76.08                      77.58             82.86
Control group           74.45            75.96                      74.78             76.52
U value                 65.21            64.85                      44.38             40.33

(p = 0.05 level, the critical value U is 50) 

Abstract 

It is by now well established that teacher characteristics play a major role in the way high stakes tests impact 
education (Alderson and Hamp-Lyons 1996). What remains an open question, however, is which 
characteristics have the potential to moderate the backwash effects of tests. This study was designed to isolate 
the role of teachers’ assessment literacy in moderating the washback effects of summative tests in the EFL 
context of Iran. A test of assessment literacy and a questionnaire on English language teaching practices were 
administered to 53 EFL secondary school teachers. Results show that teachers suffer from a poor knowledge 
base in assessment and that, no matter how assessment literate they are, they tailor their English teaching and testing 
to the demands of external tests. However, more assessment literate EFL teachers seem more likely to include 
non-washback practices in their English teaching. The implications for teacher training and teachers’ professional 
development programs are then discussed. 

Key words: washback, impact, assessment literacy 

1. Introduction 

The notion of test washback has been around for a long time. Educators seem to have always been aware of the 
effects of tests on educational programs, teachers, and learners. Only very recently, however, have scholars been 
urged to subject the notion to serious empirical investigation. Findings now suggest that the washback phenomenon 
does exist but is too complex to lend itself to simple experimental designs (Wall and Alderson 1993, Alderson and 
Hamp-Lyons 1996). It is also evident that numerous personal and contextual factors interact in shaping the impact 
of tests on classroom processes (Watanabe 2004, Alderson and Hamp-Lyons 1996). Among the personal 
factors involved, teachers’ characteristics have been found to go a long way in predicting the way education is aligned with 
the demands of high stakes tests. This study was designed to highlight the effects of one important teacher 
characteristic rarely addressed in language assessment in general and in washback studies in particular: teachers’ 
assessment literacy. Unlike test washback, teacher assessment literacy has only recently found its way onto the 
agenda of the language assessment community. In this study, we set out to examine the level of assessment literacy 
among Iranian EFL teachers and the manner in which it may moderate the negative backwash of summative, 
external examinations. To this end, two major research questions were posed: 

1) What is the extent of assessment literacy among EFL teachers in Iranian secondary schools? 

2) To what extent is the washback effect of final examinations moderated by teachers’ level of assessment 
literacy? 

2. Literature Review 

Testing is an integral part of every educational system. That is why evaluation is one of the necessary modules of 
every curriculum development program. Tests are originally designed to be at the service of learning and teaching 
(Davies 1990). Optimally, tests are to evaluate the outcomes of education a posteriori. However, tests have come to 
act beyond the original role they were given. With the advent of external tests, a reversal of roles has occurred in 
educational programs, so that sometimes it is teaching which is at the service of testing. This role reversal is 
better conceived of as a continuum. Washback refers to the extent to which tests outmaneuver teaching (Hughes 1989, 
Shohamy, Donitsa-Schmidt, & Ferman 1996). A family of similar terms, with slight differences in shades of 
meaning, has emerged, all of which share a concern for the undesired or desired influences of tests on 
learning, teaching, and society. In general education, the terms impact, curriculum alignment, and consequential 
validity are better known than the terms washback and backwash, which are frequently used in language education 
(Hamp-Lyons 1997). Studies in general education have it that high stakes tests lead teachers to waste 
instructional time, ignore higher-order thinking skills, teach to the test, and limit their focus to those learning areas 
and tasks that are likely to appear on the tests (Hamp-Lyons 1997, Mehrens and Kaminsky 1989, Fredricksen and 
Collins 1989, Cheng and Curtis 2004). In language education, Alderson and Wall (1993) pioneered the 
systematic study of test washback. They studied the nature of English language teaching classes in a wide range of 
contexts in Sri Lanka prior to the introduction of an innovative test into the educational system of the country. They 
also studied the same processes after the test had been in operation for a few years. They put forward 
15 hypotheses accounting for all the possible aspects and dimensions of test washback (Watanabe 2004). An 
important finding of that study was that tests do affect what teachers teach but are less likely to affect how they 
teach. They also found that teachers’ approach to assessing their students’ achievement is to a great extent a function 
of test washback; that is, teachers’ assessment practices are among the areas subject to the immediate impact of 
tests. Shohamy, Donitsa-Schmidt and Ferman (1996) studied the washback of two tests in Israel, a test of Arabic and 
one of English, on two occasions a few years apart. They found that, in addition to features of test design, 
other factors are at play in determining the washback of a test. In particular, they found that over time the effects of a 
test may disappear or increase. Moreover, the prestige of a language in a society contributes to the strengthening 
or weakening of test consequences: unlike the test of English, the test of Arabic failed to generate 
considerable washback because of the unpopularity of the Arabic language in Israel. Alderson and Hamp-Lyons 
(1996) investigated the washback of TOEFL and found considerable variation among teachers with regard to the degree 
and type of test washback. They commented that “our study shows clearly that the TOEFL affects both what and 
how [italics original] teachers teach, but the effect is not the same in degree or in kind from teacher to 
teacher” (p. 295). Teachers’ educational background, past learning experiences, beliefs about effective teaching and 
learning, and their attribution orientations have also been found to affect the washback phenomenon (Watanabe 
1996, 2004). In summary, teachers’ characteristics are of crucial importance in determining the extent and nature of 
test consequences. Of particular importance among such teacher features is their knowledge base in assessment, or 
what has come to be known as assessment literacy. 

ISSN 1916-4742 E-ISSN 1916-4750 www.ccsenet.org/elt English Language Teaching Vol. 4, No. 1; March 2011 

Assessment literacy (henceforth AL) is the ability to understand, analyze and apply information on student 
performance to improve instruction (Falsgraf 2005, p. 6). AL is vitally important for good teaching. Eckhout, Davis, 
Mickelson, and Goodburn (2005, p. 3) argue that good teaching is actually impossible in the absence of good 
assessment. Despite its crucial role in shaping the quality of teaching, there is evidence that teachers universally 
suffer from poor assessment literacy (Volante and Fazio 2007). Several reasons have been suggested which conspire 
to deprive teachers of an optimal level of AL. A commonly held belief is that if an individual knows how to teach a 
language, he or she knows how to assess the product and the process of language learning as well (Spolsky 1978, 
cited in Jafarpour 2003). Such mistaken beliefs contribute to further neglect of teachers’ 
knowledge base in language assessment. The intimidating appearance of assessment, as the only branch of 
applied linguistics inundated with numbers and figures, is yet another reason (Bridley 2001). Traditional delivery 
approaches to teaching assessment courses in both in-service and pre-service programs have also resulted in 
teachers’ alienation from assessment issues (Inbar-Lourie 2008). Although the role of teacher factors in 
mediating the negative and positive washback of examinations is well attested, the specific part AL may play in that interaction 
remains a lacuna in the language assessment literature. 

3. Method 

3.1 Participants 

Fifty-three EFL secondary school teachers (13 women and 40 men) sampled from three major provinces of the country 
participated in the study. As the study was done during summer, when schools are closed and teachers can hardly be 
accessed, a random sampling approach was not feasible. Eight of the participating teachers held M.A.s in TEFL and 
the rest held B.A.s in translation, English literature, or TEFL. The relatively small number of female teachers in the sample 
is due, first, to the fact that there are generally fewer female English teachers in the country. Coeducation is against 
the country’s laws (boys and girls study in separate schools and are taught by teachers of the same gender), yet, 
despite the prohibition on male teachers teaching in girls’ schools, some schools have to employ male English 
teachers. The other reason is that women can hardly manage to teach summer 
courses because of their traditional prime responsibilities at home. 

3.2 Instruments 

The Plake and James (1993) test of assessment literacy was used to measure EFL teachers’ knowledge base in 
assessment. The test consists of a set of 35 multiple choice questions followed by some self-assessment questions 
on the extent to which teachers see themselves as proficient in English, prepared for language teaching, and competent in 
language assessment. The former section includes items on various aspects of assessment, including knowledge of 
the terminology of assessment, assessment ethics, interpretation of assessment results, and procedures for designing 
tests according to local needs. Test items have a problem-solving nature, so that to answer them correctly one 
needs a solid, well-integrated knowledge of relevant assessment issues. Using Cronbach’s alpha, the 
reliability index of the test was .79. The second instrument was a Likert-scale questionnaire covering the most 
common or possible language teaching practices in the EFL context of Iran. To develop the questionnaire, the 
researchers drew upon their own experience of EFL teaching in Iranian secondary schools, in-depth interviews with 
three EFL teachers, which were recorded and transcribed, and the related literature. A pool of 50 items was piloted 
with 10 teachers. After their responses were analyzed, several modifications were made to the original version: twenty items 
were dropped on the grounds of being of little relevance, the wording of some was modified, and many items 
were reordered to enhance the validity of responses. The questionnaire in its final format consisted of two categories 
of items. One group of items related to those language teaching activities which were perceived to be directly 
targeted at helping students pass the summative tests given at the end of the year. We call this group of items 
‘washback items’ (items 1, 2, 7, 17, 18, 19, 20, 21, 24, 27, 29). The second category of items covered language 
teaching activities or tasks that were not primarily designed to help learners increase their scores on final tests. 
Rather, they were teaching practices in line with current perspectives on communicative language teaching that 
could potentially help students increase their communicative language ability irrespective of the type of examination 
administered. Teacher-constructed achievement tests were another source of data. Teachers were asked to submit 
copies of tests that they had recently developed. The requirement was that the submitted tests should be entirely 
their own work, without copying from a bank of items or borrowing test papers from a colleague. 
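
For readers unfamiliar with the reliability index mentioned above, Cronbach’s alpha can be computed from an item-by-respondent score matrix as k/(k-1) times (1 minus the sum of the item variances divided by the variance of the total scores), where k is the number of items. The sketch below applies this standard formula to a small made-up matrix of right/wrong answers; the function name and the data are illustrative assumptions, since the item-level responses of the 53 teachers are not reported.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows, one row of item scores per respondent."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Made-up answers: 4 respondents x 3 items, 1 = correct, 0 = incorrect.
    demo = [
        [1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [0, 0, 0],
    ]
    print(f"alpha = {cronbach_alpha(demo):.2f}")
```

Values closer to 1 indicate that the items vary together, which is the sense in which the reported .79 supports the internal consistency of the 35-item test.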

3.3 Procedures 

Teachers were met in the private language schools where they were teaching or the colleges where they were 
studying during summer. They were asked to respond to the questionnaire items before taking the 
assessment literacy test. One of the researchers stayed with the participants while they were answering the 
test and the questionnaire to answer any questions they might have. As to the teacher-made achievement tests, only 21 were 
collected; the rest of the teachers failed to submit tests for reasons that will be discussed in the next sections. The AL 
test papers and the Likert-scale questionnaires were scored, and the teacher-made tests were analyzed and compared 
against the framework of final exams. The two categories of questionnaire items were scored separately so that each teacher 
had two scores on the questionnaire: a score for bad language teaching practices or negative washback practices and 
another score for good language teaching practices or positive and non-washback influences. As is usually the 
case with backwash studies, descriptive statistics and correlations were used to analyze the data. 
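
The scoring scheme described above can be sketched as follows. The washback item numbers are those listed in Section 3.2; everything else is an assumption made for illustration: a final questionnaire of 30 items (the 50 piloted items minus the 20 that were dropped), simple summed Likert ratings as subscale scores, and a Pearson coefficient as the correlation measure, which the paper does not specify.

```python
from math import sqrt

# Item numbers of the 'washback items' as listed in Section 3.2.
WASHBACK_ITEMS = (1, 2, 7, 17, 18, 19, 20, 21, 24, 27, 29)
N_ITEMS = 30  # assumption: 50 piloted items minus the 20 dropped


def score_questionnaire(ratings):
    """Split one teacher's ratings into (washback, non_washback) subscale sums.

    ratings maps each item number (1..N_ITEMS) to that teacher's Likert rating.
    """
    washback = sum(ratings[i] for i in WASHBACK_ITEMS)
    non_washback = sum(r for i, r in ratings.items() if i not in WASHBACK_ITEMS)
    return washback, non_washback


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # One hypothetical teacher who rates every item 3.
    uniform = {item: 3 for item in range(1, N_ITEMS + 1)}
    print(score_questionnaire(uniform))
    # Made-up AL scores and non-washback subscale scores for five teachers.
    al_scores = [6, 9, 10, 14, 20]
    non_washback_scores = [40, 44, 43, 52, 60]
    print(f"r = {pearson_r(al_scores, non_washback_scores):.2f}")
```

Each teacher thus contributes three numbers to the analysis: an AL test score and the two questionnaire subscale scores, which can then be correlated pairwise.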

4. Results 

Table 1 shows the descriptive statistics for teachers’ self-assessments of their overall preparation for teaching 
English, for assessing their students’ performance, and their general proficiency in English. As the table illustrates, 
teachers rated their own preparedness for teaching English (Mean = 3.14) higher than their 
English language proficiency or their readiness to do proper 
language assessments. It seems that Iranian EFL teachers are, to some extent, aware of their own weak assessment 
background, as their self-assessment of their own AL yielded the lowest of the three scores 
(Mean = 2.93). Generally, however, teachers seem to believe that they are well ‘prepared’ for English teaching as 
well as its assessment requirements, given that the maximum possible rating was four on these two scales 
(preparedness for teaching and preparedness for doing assessment). 

Insert Table 1 Here 

Table 2 shows the descriptive statistics for participants’ scores on the AL measure. The maximum possible score 
is 35. As the table demonstrates, the maximum score obtained is 20, which shows that even the most assessment literate 
teacher barely managed to answer more than half the items correctly. The mean is ten out of 35, which is indicative 
of a very poor performance on the AL measure. Overall, then, teachers clearly suffer from a poor 
assessment knowledge base. This runs counter to the way they rated their own competence in language assessment, a 
mean score of 2.93 out of four. 
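
The gap between measured and self-rated competence is easier to see when the three figures are put on a common scale. The sketch below expresses each as a share of its maximum, which comes to roughly 29% for the mean AL score, 57% for the best AL score and 73% for the mean self-rating. Treating a rating out of four as a percentage is a rough simplification adopted only for this comparison, and the function name is hypothetical.

```python
MAX_AL_SCORE = 35     # number of items on the AL test
MAX_SELF_RATING = 4   # top of the self-assessment scale


def share_of_maximum(score, maximum):
    """Express a score as a percentage of the highest attainable value."""
    return 100 * score / maximum


if __name__ == "__main__":
    figures = {
        "mean AL test score": share_of_maximum(10, MAX_AL_SCORE),
        "highest AL test score": share_of_maximum(20, MAX_AL_SCORE),
        "mean self-rated assessment competence": share_of_maximum(2.93, MAX_SELF_RATING),
    }
    for label, percent in figures.items():
        print(f"{label}: about {percent:.0f}% of the maximum")
```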

Insert Table 2 Here 

It was indicated in the previous section that fewer than half the participating teachers agreed to submit a copy of their 
self-made tests. When asked why they declined, some 
responded that they never took the time to construct new tests, and the rest pointed out that they administered 
test papers borrowed from colleagues. This is in itself an alarming finding: given the importance currently assigned to 
the role of assessment in promoting language learning, it is unfortunate to find such an indifferent attitude towards 
assessment among EFL teachers. 

Normally, the English final examinations in Iranian secondary schools start with a spelling test followed by a set of 
fill-in-the-blanks vocabulary items. A few matching synonym or antonym items then follow. Other sections include 
multiple-choice discrete-point grammatical items and a few matching items related to conversations included in 
textbooks (these are called language functions). The test ends with a passage for reading comprehension followed 
by two true/false items and a few open-ended or multiple-choice items. The analysis of the achievement tests made 
by teachers showed severe washback of final examinations, as even minor deviations from the format and content of 
final tests were rare. This confirmed what teachers indicated in the interviews. One of them stated: 
“I may sometimes increase the number of vocabulary items and reduce the grammatical ones because I personally 
do not believe in the usefulness of grammar, however, I am careful to observe the overall framework of the final 
examinations”. The obvious washback effect on teacher-made tests may be, at least partially, induced by the fact 
that teachers are externally monitored for their conformity with the external test formats. Another teacher said, “I 
know of no other method of testing, I think final examinations are perfect and standard ways of assessing English 
knowledge of students”. The above quote clearly shows that Iranian EFL teachers are not aware of the many options 
they have at their disposal for assessing the communicative language ability of their students. This lack of 
knowledge results in a failure to practice, which in turn contributes to the further deterioration of teachers’ 
competence in language assessment. 

The analysis of teachers’ responses on the questionnaire shed further light on the way they are affected by final 
tests as well as on the interaction of assessment literacy and test washback. Descriptive statistics for teachers’ 
responses on the washback items speak to the heavy influence of final examinations on the way teachers teach English 
(mean = 40.11, SD = 5.25). The maximum possible score was 55, as there were 11 items. This high score clearly shows 
that teachers with various levels of AL tailored their teaching to the demands of final examinations. To determine 
the extent to which teachers with various AL scores adjust their teaching to the requirements of final tests, a 
correlation was run between teachers’ AL scores and their scores on the 11 washback items. We expected an inverse 
correlation between teachers’ assessment literacy scores and the degree of their exam-like language teaching. The 
result did not, however, confirm our expectation, since no negative correlation was found between AL scores and the 
scores on the washback items of the questionnaire. This lack of correlation indicates that, regardless of their 
assessment literacy level, all teachers experience the washback of final examinations, and the influence is 
enormous, as indicated by the descriptive statistics (the mean was 40.11 out of a maximum possible 55). We 
ran a second correlation between teachers’ AL scores and their scores on the non-washback items of the 
questionnaire. Although the correlation coefficient was not very high, it was significant at the .05 level. 
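
To illustrate what "not very high but significant" means here, the following minimal sketch converts the coefficient reported in Table 3 (r = .328, N = 53) into a t statistic using the conventional transformation t = r·sqrt(N − 2)/sqrt(1 − r²). The paper does not state which significance test was run, so this is one standard way to check such a claim, not necessarily the authors' procedure. The critical value 2.008 is an assumption read from a standard two-tailed t table for alpha = .05 and 51 degrees of freedom, and the function name `t_from_r` is hypothetical.

```python
import math

# Values taken from Table 3 of the paper.
R = 0.328   # Pearson correlation between AL and non-washback scores
N = 53      # number of teachers

# Assumption: approximate two-tailed critical t for alpha = .05, df = 51,
# taken from a standard t table (not reported in the paper).
T_CRITICAL_05 = 2.008


def t_from_r(r: float, n: int) -> float:
    """Convert a Pearson r based on n pairs into a t statistic with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


t_stat = t_from_r(R, N)
shared_variance = R * R  # proportion of variance the two scores share

print(f"t({N - 2}) = {t_stat:.2f}")
print(f"significant at .05 (two-tailed): {abs(t_stat) > T_CRITICAL_05}")
print(f"shared variance (r squared): {shared_variance:.3f}")
```

The t statistic exceeds the assumed critical value, which is consistent with significance at the .05 level, while r² of roughly 0.11 indicates that the two sets of scores share only about a tenth of their variance, which is why the coefficient is described as modest.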

Insert Table 3 Here 

This correlation shows that more assessment-literate teachers, although they try equally hard to meet the demands of 
final exams (they are subject to the negative washback of final tests to the same extent), nevertheless add variety 
to their language teaching by employing strategies that are better supported by current theories of communicative 
language teaching and learning. Giving more attention to higher-order thinking processes in teaching reading, 
employing more pair-work and group-work communicative activities, and focusing more on productive tasks in writing 
and speaking are typical of such non-washback activities. One possible explanation for the association between 
teachers’ AL scores and their scores on the non-washback scale of the questionnaire is that it is teachers’ English 
proficiency, not their assessment literacy, that explains the correlation, or possibly a combination of both. To 
examine this, we ran a correlation between teachers’ self-assessment scores of their own English proficiency and 
their scores on the AL measure. No meaningful relationship was revealed, giving us more trust, but not absolute 
trust, in the power of assessment literacy to affect EFL teaching quality. It should also be mentioned that teachers, 
on average, assessed their own English proficiency as ‘good’. These two points remove part of the doubt about 
attributing either teachers’ assessment literacy scores or their communicative language teaching practices to higher 
English language proficiency. Nevertheless, in the absence of more solid experimental evidence controlling for all 
possible factors, it is hard to claim any causal relationship between teachers’ competence in language assessment and 
the nature of their English teaching. Moreover, there seemed to be a negative correlation between teaching 
experience and knowledge base in assessment; the correlation coefficient was not significant, however. It would be 
plausible to think that, under the influence of standardized tests, the more experienced teachers have lost a larger 
portion of their assessment literacy, having been exposed for a longer time to the paralyzing effects of tests. 

5. Discussion and Conclusion 

We can posit with a moderate degree of certitude that Iranian EFL teachers’ knowledge base in assessment is far 
below a satisfactory level. On closer inspection of teachers’ responses, it was found that more than one-third (19) of 
the teachers could not recognize the appropriate definition of ‘reliability’ in a multiple-choice item (the second 
item in the AL measure). There were even a couple of teachers who failed to answer a single item correctly. The 
teacher with the maximum AL score answered only slightly more than half the items correctly. All these pieces of 
evidence show that Iranian EFL secondary school teachers have a very poor knowledge base in language assessment. 

The way teachers conduct their assessments is closely aligned with the final standardized examinations. The large 
number of teachers who do not construct tests at all is in itself another sad finding. It indicates that they are 
minimally sensitive to, or aware of, the importance of assessment in EFL teaching. This lack of practice in 
communicative assessment, together with adherence to fixed, traditional forms of assessment, results in the long run 
in further deterioration of their assessment competence. 

Teachers with various levels of competence in assessment model their teaching on the pattern of final examinations 
in trying to help their students succeed on such tests. Pragmatically, there seems to be nothing wrong with this 
approach insofar as the success of both students and teachers is judged on the basis of students’ performance on such 
tests. More assessment-literate teachers, however, in addition to trying to help their students succeed on final 
tests, seem to invest more in original language learning tasks not targeted directly at raising students’ scores on 
summative tests. Interesting as this finding may seem, we remain cautious about asserting with any certitude a 
causal relationship between the quality of EFL teaching and teachers’ knowledge base in assessment. Factors both 
internal and external to the study may account for the association between higher scores on the questionnaire and on 
the AL measure. We may legitimately argue that the significant correlation is likely to be a function of both English 
proficiency and assessment literacy, or of the former alone, since the test was not translated into Persian (although 
this was not verified in our analysis of teachers’ scores on the self-assessment scale of proficiency and their 
scores on the AL measure). Nevertheless, it is still plausible to think that teachers with poor language proficiency 
are less likely to employ teaching activities which demand higher levels of communicative language ability. In 
addition, although self-assessments are praised for their potential to induce positive washback such as learner 
autonomy, their validity as true measures of proficiency is difficult to sustain given the numerous factors that may 
affect one’s judgment of one’s own ability. Had we measured participant teachers’ English proficiency through a test 
like IELTS or TOEFL, rather than through their own self-ratings, we would have been in a better position to comment on 
their language proficiency as well as their language assessment literacy and teaching practices. As washback studies, 
by their very nature, do not lend themselves to experimental designs that isolate the effects of separate factors, 
such findings should be approached with extreme caution unless further evidence accrues. 

The extremely low assessment literacy observed among EFL teachers calls for a thorough overhaul of both 
pre-service and in-service assessment training courses. As Inbar-Lourie (2008) asserts, the traditional delivery 
approach to teaching assessment courses has proved to be futile. Assessment courses, both in-service and 
pre-service, should follow more down-to-earth approaches focusing more on assessment practices than on 
theoretical issues. Teachers’ fear of assessment should be allayed by involving them in serious cooperative 
assessment practices. Iranian EFL teachers are never invited to practice their expertise in any serious large-scale 
assessment project. Moreover, easing teachers’ access to local, national and international testing and assessment 
journals can contribute to the improvement of teachers’ AL. Finally, modifying the structure of traditional, 
discrete-point final examinations and employing more communicative language testing can benefit EFL teaching 
both by producing positive washback and by promoting teachers’ knowledge base in language assessment. 

Table 1. Descriptive statistics for teachers’ self-assessment ratings 

                              N     Min   Max   Mean   Std. Deviation 
Preparation for teaching      49    1     4     3.14   .61 
Preparation for assessment    47    1     4     2.93   .76 
English proficiency           52    1     5     3.03   .79 

Table 2. Participants’ scores on the AL test 

N     Minimum   Maximum   Mean      Standard Deviation 
53    .00       20.00     10.0377   4.80369 

Table 3. The correlation between participants’ AL scores and their scores on the non-washback items of the 
questionnaire 

                                                  AL    Non-washback 
Assessment literacy (AL)    Pearson Correlation   1     .328* 
                            Sig. (2-tailed)             .0117 
                            N                     53    53 